Below is a scatterplot comparing GDP per capita with CO2 emissions in metric tons per capita in the year 1962. The correlation between the two variables and its associated p-value are depicted on the plot.

## Warning: Removed 151 rows containing non-finite values (`stat_cor()`).
## Warning: Removed 151 rows containing missing values (`geom_point()`).
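The correlation and p-value annotated on such a plot can be reproduced numerically. The following is a minimal sketch, assuming a Pearson correlation; the data frame `toy_1962` and the column names `gdp_per_capita` and `co2_per_capita` are hypothetical stand-ins, not the names used in the original analysis.

```r
# Hypothetical toy data standing in for the 1962 rows of the report's dataset.
toy_1962 <- data.frame(
  gdp_per_capita = c(500, 1200, 3000, NA, 8000, 15000),
  co2_per_capita = c(0.1, 0.4, 1.5, 2.0, 5.2, 9.8)
)

# cor.test() drops incomplete pairs on its own; filtering first simply makes
# the number of removed rows explicit (compare the warnings above).
complete_rows <- complete.cases(toy_1962)
n_removed <- sum(!complete_rows)

result <- cor.test(
  toy_1962$gdp_per_capita[complete_rows],
  toy_1962$co2_per_capita[complete_rows],
  method = "pearson"
)

r_value <- unname(result$estimate)  # correlation coefficient
p_value <- result$p.value           # p-value for H0: true correlation is 0
```

Counting the incomplete rows up front is one way to see where a "Removed N rows" warning comes from.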

Below is the year in which the correlation between GDP per capita and CO2 emissions in metric tons per capita was the strongest.

## [1] 1967
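One way to find such a year is to compute the correlation separately for each year and keep the year with the largest value. This sketch assumes "strongest" means largest in absolute value and uses hypothetical toy data and column names (`Year`, `gdp_per_capita`, `co2_per_capita`).

```r
# Hypothetical toy data: three years, four countries each.
toy <- data.frame(
  Year = rep(c(1962, 1967, 1972), each = 4),
  gdp_per_capita = rep(c(1, 2, 3, 4), times = 3),
  co2_per_capita = c(1, 3, 2, 4,   # 1962: fairly strong positive
                     1, 2, 3, 4,   # 1967: perfect positive
                     4, 1, 3, 2)   # 1972: weak negative
)

# One Pearson correlation per year, ignoring incomplete pairs.
cor_by_year <- sapply(split(toy, toy$Year), function(d) {
  cor(d$gdp_per_capita, d$co2_per_capita, use = "complete.obs")
})

# Assumption: "strongest" is judged by absolute value, so a strong negative
# correlation would also qualify.
strongest_year <- as.numeric(names(which.max(abs(cor_by_year))))
```

If only positive association is of interest, `which.max(cor_by_year)` without `abs()` would be the alternative.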

Below is an interactive plot of GDP per capita against CO2 emissions in metric tons per capita in 1967, where the size of each point is proportional to the corresponding nation's population and the points are color-coded by continent.

## Warning: Removed 146 rows containing non-finite values (`stat_cor()`).

Below we investigate the relationship between continent and energy use (kg of oil equivalent per capita) across the entire dataset. Because continent is a categorical variable with more than two levels and energy use is a quantitative variable, we start with a one-way ANOVA test for statistical significance.

##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## continent     4 7.715e+08 192870621   51.46 <2e-16 ***
## Residuals   843 3.160e+09   3748033                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 436 observations deleted due to missingness
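A table of this shape comes from `aov()` followed by `summary()`. The sketch below is illustrative only: the toy data frame and the column name `energy_use` are hypothetical, and the values are chosen so the groups differ clearly.

```r
# Hypothetical toy data: the same within-group spread around three means.
noise <- rep(c(-200, -100, 0, 100, 200), times = 2)
toy <- data.frame(
  continent = factor(rep(c("Africa", "Asia", "Europe"), each = 10)),
  energy_use = c(700 + noise, 1900 + noise, 3100 + noise)
)

# One-way ANOVA; rows with missing values are dropped by default, which is
# what produces an "observations deleted due to missingness" note.
fit <- aov(energy_use ~ continent, data = toy)
anova_table <- summary(fit)[[1]]

df_between <- anova_table[["Df"]][1]      # number of groups minus 1
df_within  <- anova_table[["Df"]][2]      # observations minus number of groups
f_value    <- anova_table[["F value"]][1]
p_value    <- anova_table[["Pr(>F)"]][1]
```

The F value is the ratio of the two mean squares; in the output above, 192870621 / 3748033 is approximately 51.46.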

Because the p-value (< 2e-16) is below 0.05, we reject, at the 5% significance level, the null hypothesis that mean energy use is the same on every continent; there is a statistically significant relationship between continent and energy use in this dataset. To see which continents differ, we now use Tukey's HSD test for pairwise comparisons of mean energy use between each pair of continents.

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = `Energy use (kg of oil equivalent per capita)` ~ continent, data = withCountry)
## 
## $continent
##                       diff       lwr       upr     p adj
## Americas-Africa  1005.1037  466.8326 1543.3748 0.0000041
## Asia-Africa      1168.7636  628.2529 1709.2742 0.0000000
## Europe-Africa    2447.5453 1947.3838 2947.7067 0.0000000
## Oceania-Africa   3281.7976 2040.3410 4523.2543 0.0000000
## Asia-Americas     163.6599 -384.4160  711.7357 0.9256447
## Europe-Americas  1442.4416  934.1141 1950.7691 0.0000000
## Oceania-Americas 2276.6940 1031.9249 3521.4630 0.0000069
## Europe-Asia      1278.7817  768.0833 1789.4801 0.0000000
## Oceania-Asia     2113.0341  867.2950 3358.7732 0.0000402
## Oceania-Europe    834.2524 -394.5176 2063.0223 0.3421942
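The pairs whose adjusted p-value exceeds 0.05 can be pulled out of a `TukeyHSD()` result programmatically rather than read off by eye. This is a minimal sketch on hypothetical toy data (the column name `energy_use` and all values are invented for illustration).

```r
# Hypothetical toy data: Americas and Asia are close, Africa is far lower.
noise <- rep(c(-200, -100, 0, 100, 200), times = 2)
toy <- data.frame(
  continent = factor(rep(c("Africa", "Americas", "Asia"), each = 10)),
  energy_use = c(700 + noise, 1700 + noise, 1750 + noise)
)

fit <- aov(energy_use ~ continent, data = toy)
tukey <- TukeyHSD(fit, "continent", conf.level = 0.95)

# tukey$continent is a matrix with one row per pair and the columns
# diff, lwr, upr, and "p adj".
p_adj <- tukey$continent[, "p adj"]
not_significant <- names(p_adj)[p_adj > 0.05]
```

Pairs are labeled with the later factor level first (for example `Asia-Americas`), which matches the row labels in the output above.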

As seen above, two pairs have an adjusted p-value above 0.05: Asia-Americas and Oceania-Europe. This is consistent with a three-way grouping of energy use by continent: Africa alone, Asia together with the Americas, and Europe together with Oceania. Failing to find a difference within a pair does not prove the two continents are alike, and the Oceania-Europe interval is wide (-394.5 to 2063.0), so the grouping is suggestive rather than conclusive. Within-group similarity is illustrated by the boxplot below.

## Warning: Removed 436 rows containing non-finite values (`stat_boxplot()`).

Below we test whether Europe and Asia differ significantly in imports of goods and services as a % of GDP. Because we are comparing the mean of a quantitative variable between two groups (two levels of the continent variable), a two-sample t-test is used.

## # A tibble: 1 × 1
##     pval
##    <dbl>
## 1 0.0412

Because the p-value (0.0412) is below 0.05, the difference between Europe and Asia in imports of goods and services as a % of GDP is statistically significant at the 5% level.
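A two-group comparison like this can be sketched with `t.test()`. The report does not say whether equal variances were assumed; the sketch below uses R's default Welch test, and the toy data and the column name `imports_pct_gdp` are hypothetical.

```r
# Hypothetical toy data with a third continent that must be filtered out.
toy <- data.frame(
  continent = rep(c("Europe", "Asia", "Africa"), each = 6),
  imports_pct_gdp = c(30, 35, 40, 45, 50, 55,
                      20, 22, 25, 27, 30, 32,
                      15, 18, 21, 24, 27, 30)
)

# Keep only the two continents being compared.
two_groups <- subset(toy, continent %in% c("Europe", "Asia"))

# Assumption: Welch's unequal-variance t-test (the t.test() default).
# Pass var.equal = TRUE for the pooled-variance Student's t-test instead.
result <- t.test(imports_pct_gdp ~ continent, data = two_groups)
p_value <- result$p.value
```

The choice between the Welch and pooled versions can change the p-value, so it is worth stating which one a reported result used.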

Below is the country with the highest average population density (people per sq. km of land area) across all years in the dataset.

## # A tibble: 1 × 1
##   `Country Name`  
##   <chr>           
## 1 Macao SAR, China
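A ranking like this can be computed by averaging density per country and keeping every country that reaches the maximum, so that any tie would appear as multiple names. In this sketch only `Country Name` comes from the output above; the toy values and the column name `pop_density` are hypothetical.

```r
# Hypothetical toy data: three countries, three years each, one missing value.
toy <- data.frame(
  `Country Name` = rep(c("Country A", "Country B", "Country C"), each = 3),
  pop_density = c(10, 12, NA, 300, 320, 340, 50, 55, 60),
  check.names = FALSE  # keep the space in "Country Name"
)

# Mean density per country across all years, skipping missing years.
avg_density <- tapply(toy$pop_density, toy[["Country Name"]], mean, na.rm = TRUE)

# Keep every country that reaches the maximum rather than only the first.
top_countries <- names(avg_density)[avg_density == max(avg_density, na.rm = TRUE)]
```

Using `which.max()` here instead would silently return only the first country in the event of a tie.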

Below is the country with the largest increase in life expectancy between 1962 and 2007.

## # A tibble: 1 × 1
##   `Country Name`
##   <chr>         
## 1 Maldives
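The increase can be computed by putting each country's 1962 and 2007 values side by side and subtracting. This sketch uses hypothetical toy data; apart from `Country Name`, the column names (`Year`, `life_expectancy`) are assumptions.

```r
# Hypothetical toy data: one row per country and year.
toy <- data.frame(
  `Country Name` = rep(c("Country A", "Country B", "Country C"), each = 2),
  Year = rep(c(1962, 2007), times = 3),
  life_expectancy = c(40, 70, 65, 78, 50, 62),
  check.names = FALSE  # keep the space in "Country Name"
)

start <- toy[toy$Year == 1962, c("Country Name", "life_expectancy")]
end   <- toy[toy$Year == 2007, c("Country Name", "life_expectancy")]

# Join the two years per country; countries missing either year drop out.
change <- merge(start, end, by = "Country Name", suffixes = c("_1962", "_2007"))
change$increase <- change$life_expectancy_2007 - change$life_expectancy_1962

largest <- change[["Country Name"]][which.max(change$increase)]
```

Because `merge()` performs an inner join by default, a country with no value in one of the two years is excluded rather than treated as zero.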